ABSTRACT

Modularity is widely used to measure the strength of the community
structure found by community detection algorithms. However, modularity
maximization suffers from two opposite yet coexisting problems: in some
cases, it tends to favor small communities over large ones, while in
others, it favors large communities over small ones. The latter tendency
is known in the literature as the resolution limit problem. To address
both problems, we propose to modify modularity by subtracting from it the
fraction of edges connecting nodes of different communities and by
including community density in modularity. We refer to the modified
metric as Modularity Density and we demonstrate that it indeed resolves
both problems mentioned above. We describe the motivation for introducing
this metric by using intuitively clear and simple examples. We also prove
that this new metric solves the resolution limit problem. Finally, we
discuss the results of applying this metric, modularity, and several
other popular community quality metrics to two real dynamic networks. The
results imply that Modularity Density is consistent with all the other
community quality measurements but not with modularity, which suggests
that Modularity Density is an improved measurement of community quality
compared to modularity.


I INTRODUCTION 

Communities are the basic structures in sociology in general and in
social networks in particular. They have been intensively researched for
more than half a century [1]. Community in sociology usually refers to a
social unit whose members share common values; the identity of the
members, as well as their degree of cohesiveness, depends on individuals'
social and cognitive factors such as beliefs, preferences, or needs. The
ubiquity of the Internet and social media eliminated spatial limitations
on the geographical range of communities, enabling on-line communities to
link people regardless of their physical location. The newly arising
computational sociology relies on computationally intensive methods to
analyze and model social phenomena [2], including communities and their
detection.

Analysis of social networks became one of the basic tools of sociology
[3] and has been used for linking micro and macro levels of sociological
theory. The classical example of the approach is presented in [4], which
elaborated the macro implications of one aspect of small-scale
interaction, the strength of dyadic ties. Moreover, many commercial
applications, such as digital marketing, behavioral targeting, and user
preference mining, rely heavily on community analysis. With the rapid
growth of large-scale on-line social networks (e.g., Facebook connected a
billion users in 2012), there is a high demand for efficient community
detection algorithms that are able to handle their evolution and growth.
Communities in on-line social networks are discovered by analyzing the
observed and often recorded on-line interactions between people.

In computational sociology, communities are defined 
as groups of nodes in a social network within which 
connections are denser than between them [^. This 
definition has also been found useful in other types of networks, and
community detection became one of the fundamental issues in network
science. Community detection has been shown to reveal latent yet
meaningful structure not only for groups in on-line and contact-based
social networks, but also for groups of customers with similar interests
in online retailer user networks, groups of scientists in
interdisciplinary collaboration networks, and functional modules in
protein-protein interaction networks in biology [6]. Since in most
applications the real communities are not known (often due to the cost of
establishing ground truth in large on-line social networks), there is a
need for reliable metrics to evaluate detected communities, so that these
metrics can be used to rank the quality of community structures
discovered by different community detection algorithms. Such metrics can
also be used to develop novel community detection algorithms that
iteratively attempt to improve the metrics by merging or splitting the
communities of a given network community structure.

In the last decade, the most popular community detection method, proposed
by Newman [7], has been to maximize the quality metric known as
modularity [5, 8] over all the possible partitions of a network. This
metric measures the difference (relative to the total number of edges)
between the actual and expected (in a randomized graph with the same
number of nodes and the same degree distribution) number of edges within
a given community. It is widely used to measure the strength of the
community structures detected by community detection algorithms. However,
modularity maximization has two opposite yet concurrent problems. In some
cases, it tends to split large communities into smaller communities. In
other cases, it tends to form large communities by merging communities
that are smaller than a certain threshold, which depends on the total
number of edges in the network and on the degree of inter-connectivity
between the communities. The latter problem is known as the resolution
limit problem [9].

To solve these two problems simultaneously, we propose a new community
quality metric, which we term Modularity Density, as an alternative to
modularity. First, we show that modularity decreased by Split Penalty,
defined as the fraction of edges that connect nodes of different
communities, solves the problem of favoring small communities. Next, we
demonstrate that including community density in modularity addresses the
problem of favoring large communities. We refer to the resulting metric
as Modularity Density.

We formally prove that Modularity Density resolves the resolution limit
problem. We also discuss our experiments with this metric, modularity,
and other popular community quality metrics, including the number of
Intra-edges, Contraction, the number of Inter-edges, Expansion, and
Conductance [10], on two real dynamic networks. The results show that
Modularity Density is different from the original modularity, but
consistent with all those community quality measurements, which implies
that Modularity Density is effective in measuring the community quality
of networks.

The rest of the paper is organized as follows. First, in Section II we
discuss related work. Then, we briefly introduce modularity and
illustrate our motivation for proposing the new metric with examples in
Section III. Section IV presents the formal proofs and the experiments
that demonstrate that Modularity Density solves the two problems of
modularity simultaneously. Finally, we conclude and discuss future work
in Section V.


II RELATED WORK 

Community detection in complex networks has received a considerable
amount of attention in recent years. Numerous techniques have been
developed for both efficient and effective community detection, including
Modularity Optimization, Clique Percolation, Local Expansion, Fuzzy
Clustering, Link Partitioning, and Label Propagation. The above
algorithms are designed to detect communities in static networks.
However, networks such as the Internet and online social networks are
usually dynamic, with changes arriving as a stream. Thus, a large number
of algorithms were proposed to cope with community detection in
dynamically evolving networks, such as LabelRankT and Estrangement.
LabelRankT detects communities in large-scale dynamic networks through
stabilized label propagation. Estrangement detects temporal communities
by maximizing modularity in a snapshot subject to a constraint on the
estrangement from the partition in the previous snapshot.

In addition to the development of algorithms for community detection,
several metrics for evaluating the quality of community structure have
been introduced. The most popular and widely used is modularity [5, 8].
It is defined as the difference (relative to the total number of edges)
between the actual and expected (in a randomized graph with the same
number of nodes and the same degree sequence) number of edges inside a
given community. Although initially defined for unweighted and undirected
networks, the definition of modularity has been subsequently extended to
capture community structure in weighted networks [29] and then in
directed networks [30].

However, Fortunato and Barthelemy [9] recently presented the resolution
limit problem of modularity, the essence of which is that optimizing
modularity will not find communities smaller than a threshold size, or
weight. This threshold depends on the total number, or total weight, of
edges in the network and on the degree of interconnectedness between the
communities. Moreover, Good et al. [32] showed that the range of
modularity values computed over all possible partitions of a graph has a
structure in which the maximum modularity partition is typically
concealed among an exponentially large (in terms of the graph size)
number of structurally dissimilar, high-modularity partitions. To address
the resolution limit problem, multi-resolution versions of modularity
were proposed to allow researchers to specify a tunable target resolution
limit parameter and identify communities on that scale. Typically, it is
not clear how to choose the correct value for this parameter.
Furthermore, Lancichinetti and Fortunato [35] stated that even those
multi-resolution versions of modularity, as well as its original version,
are not only inclined to merge the smallest well-formed communities but
also to split the largest well-formed communities. In contrast, the
Modularity Density metric we propose here solves those two problems of
modularity without the trouble of specifying any particular parameter.

Figure 1: Six simple network examples that have two different community
structures, one with a single big community containing all eight nodes
and the other with the two small communities each containing four
different nodes. Panels: (a) Two very well separated communities. (b) Two
well separated communities. (c) Two weakly connected communities. (d)
Ambiguity between one and two communities. (e) One well connected
community. (f) One very well connected community.

III MODULARITY DENSITY

In this section, we first formally introduce Newman's definition of
modularity and then illustrate the motivation for modifying modularity
with several simple network examples. Next, we propose a new community
quality metric, called Modularity Density, as an alternative to
modularity by combining modularity with Split Penalty and community
density to avoid the two coexisting problems of modularity. Finally, we
define Modularity Density for different kinds of networks, including
unweighted and undirected networks, weighted networks, and directed
networks, based on the corresponding formulas of modularity.

Table 1: Metric values of the example: Two very well separated communities.

                   Modularity (Q)   Split Penalty (SP)   Qs      Qds
Two communities    0.5              0                    0.5     0.5
One community      0                0                    0       0.245

Table 2: Metric values of the example: Two well separated communities.

                   Modularity (Q)   Split Penalty (SP)   Qs      Qds
Two communities    0.357            0.143                0.214   0.339
One community      0                0                    0       0.25

Table 3: Metric values of the example: Two weakly connected communities.

                   Modularity (Q)   Split Penalty (SP)   Qs      Qds
Two communities    0.3              0.2                  0.1     0.263
One community      0                0                    0       0.249


1 NEWMAN’S MODULARITY 


Modularity for unweighted and undirected networks is defined as the
difference, relative to the total number of edges, between the actual and
expected (in a randomized graph with the same number of nodes and the
same degree sequence) number of edges within the communities. For a given
community partition of a network $G = (V, E)$ with $|E|$ edges,
modularity ($Q$) [5] is given by

$$
Q = \sum_{c_i \in C} \left[ \frac{|E_{c_i}^{in}|}{|E|}
    - \left( \frac{2|E_{c_i}^{in}| + |E_{c_i}^{out}|}{2|E|} \right)^2 \right],
\tag{1}
$$

where $C$ is the set of all the communities, $c_i$ is a specific
community in $C$, $|E_{c_i}^{in}|$ is the number of edges between nodes
within community $c_i$, and $|E_{c_i}^{out}|$ is the number of edges from
the nodes in community $c_i$ to the nodes outside $c_i$.
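
As an illustration, Equation (1) can be evaluated directly from an edge list. The sketch below is not from the original text: the function name `modularity`, the helper `clique`, and the concrete eight-node graph (two four-node cliques joined by two edges, giving 14 edges in total) are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations


def modularity(edges, membership):
    """Evaluate Equation (1) for an undirected, unweighted graph.

    edges      -- list of (u, v) pairs, each undirected edge listed once
    membership -- dict mapping every node to its community label
    """
    total = len(edges)                 # |E|
    e_in = defaultdict(int)            # edges inside each community
    e_out = defaultdict(int)           # edges leaving each community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_out[cu] += 1
            e_out[cv] += 1
    return sum(
        e_in[c] / total - ((2 * e_in[c] + e_out[c]) / (2 * total)) ** 2
        for c in set(membership.values())
    )


def clique(nodes):
    """All edges of a complete graph on the given nodes."""
    return list(combinations(nodes, 2))


# Assumed example graph: two 4-cliques joined by two edges (14 edges).
edges = clique(range(4)) + clique(range(4, 8)) + [(0, 4), (1, 5)]
two_communities = {n: n // 4 for n in range(8)}
one_community = {n: 0 for n in range(8)}

q_two = modularity(edges, two_communities)
q_one = modularity(edges, one_community)
print(round(q_two, 3), round(q_one, 3))
```

With a single community containing every node, the first term equals 1 and the squared term equals 1, so modularity is always 0 for that partition, which is why the "One community" rows of the tables report Q = 0.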


The definition of modularity [29] for weighted networks has precisely the
same formula, Equation (1), as for unweighted and undirected networks.
However, for weighted networks, $|E|$ is the sum of the weights of all
the edges in the network, $|E_{c_i}^{in}|$ is the sum of the weights of
the edges between nodes within community $c_i$, and $|E_{c_i}^{out}|$ is
the sum of the weights of the edges from the nodes in community $c_i$ to
the nodes outside $c_i$.


The formula of modularity for directed networks [30] is as follows

$$
Q = \sum_{c_i \in C} \left[ \frac{|E_{c_i}^{in}|}{|E|}
    - \frac{(|E_{c_i}^{in}| + |E_{out,c_i}|)(|E_{c_i}^{in}| + |E_{c_i,out}|)}{|E|^2} \right],
\tag{2}
$$

where $|E_{out,c_i}|$ is the number of edges from the nodes outside $c_i$
to the nodes in $c_i$ and $|E_{c_i,out}|$ is the number of edges from the
nodes in $c_i$ to the nodes outside $c_i$. For undirected networks, it is
clear that $|E_{out,c_i}| = |E_{c_i,out}| = |E_{c_i}^{out}|$ and thus the
directed modularity reduces to the undirected modularity.


2 MOTIVATION FOR INTRODUCING 
SPLIT PENALTY 

In this subsection, we demonstrate the motivation for introducing Split
Penalty into modularity by using seven intuitively clear and simple
network examples, six of which are presented in Figure 1. The seventh
example is a complete graph with eight nodes and one big community
containing all eight nodes, while the alternative partition consists of
two small communities each containing four different nodes. It is easy to
judge that for the first, second, and third examples, the community
structure with two small communities is better than the community
structure in which they are merged together. For the fourth example, the
two different community structures are nearly of the same quality.
However, for the fifth, sixth, and seventh examples, the community
structure with one big community is of better quality than the
alternative.

Tables 1-7 show the metric values of the seven network examples described
above. Tables 1-3 and Table 7 demonstrate that modularity succeeds in
measuring the quality of the two different community structures in those
four examples. However, from Tables 4-6 we observe that modularity fails
to measure the community quality of those three examples because it
implies that the community structure with two small communities is
better. For the fifth and the sixth examples, the community structure
with one big community is of better quality. Yet, in this case modularity
gives preference to the community structure with two separate small
communities, demonstrating that modularity has the problem of favoring
small communities.

Table 4: Metric values of the example: Ambiguity between one and two communities.

                   Modularity (Q)   Split Penalty (SP)   Qs       Qds
Two communities    0.25             0.25                 0        0.188
One community      0                0                    0        0.245

Table 5: Metric values of the example: One well connected community.

                   Modularity (Q)   Split Penalty (SP)   Qs       Qds
Two communities    0.167            0.333                -0.167   0.0417
One community      0                0                    0        0.23

Table 6: Metric values of the example: One very well connected community.

                   Modularity (Q)   Split Penalty (SP)   Qs       Qds
Two communities    0.0455           0.455                -0.409   -0.239
One community      0                0                    0        0.168

Table 7: Metric values of the example: One Complete Graph.

                   Modularity (Q)   Split Penalty (SP)   Qs       Qds
Two communities    -0.0714          0.571                -0.643   -0.643
One community      0                0                    0        0

To address the drawback of favoring small communities, we propose that
the quality of the community structure should take into account the edges
between different communities. We introduce Modularity with Split Penalty
(Qs) by subtracting from modularity the Split Penalty (SP), which is the
fraction of edges that connect nodes of different communities. More
formally,

$$
Q_s = Q - SP. \tag{3}
$$

The intuition here is clear. Modularity measures the positive effect of
grouping nodes together in terms of taking into account existing edges
between nodes, while Split Penalty measures the negative effect of
ignoring edges joining members of different communities. Enlarging a
community eliminates some Split Penalty, but if there are only a few
edges across the current partition, the modularity of the merged
community could be lower, negating the benefit of merging. Splitting a
community into two or more communities introduces some Split Penalty, but
if there are only a few edges between those separated communities, an
increase of modularity can make such splitting beneficial. Tables 1-7
demonstrate that Qs can correctly measure the quality of the community
structures of all seven network examples.
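
The trade-off between modularity and Split Penalty can be made concrete with a short sketch. The code below is illustrative only: the function name `modularity_with_split_penalty` and the choice of graph (two four-node cliques joined by six assumed cross edges, which gives the 18 edges consistent with the values reported in Table 5) are assumptions, since the exact edges of the figure are not given in the text.

```python
from collections import defaultdict
from itertools import combinations


def modularity_with_split_penalty(edges, membership):
    """Return (Q, SP, Qs) for an undirected, unweighted graph.

    Q follows Newman's modularity, SP is the fraction of edges that join
    different communities, and Qs = Q - SP.
    """
    total = len(edges)
    e_in = defaultdict(int)
    e_out = defaultdict(int)
    crossing = 0                        # edges joining different communities
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_out[cu] += 1
            e_out[cv] += 1
            crossing += 1
    q = sum(
        e_in[c] / total - ((2 * e_in[c] + e_out[c]) / (2 * total)) ** 2
        for c in set(membership.values())
    )
    # Each crossing edge is counted once from each side and divided by 2|E|.
    sp = 2 * crossing / (2 * total)
    return q, sp, q - sp


# Assumed graph: two 4-cliques plus six cross edges (18 edges in total).
cross = [(0, 4), (1, 5), (2, 6), (3, 7), (0, 5), (1, 6)]
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2))
         + cross)

split = modularity_with_split_penalty(edges, {n: n // 4 for n in range(8)})
merged = modularity_with_split_penalty(edges, {n: 0 for n in range(8)})
print([round(x, 3) for x in split])   # two communities
print([round(x, 3) for x in merged])  # one community
```

For this graph modularity alone prefers the split partition (Q is positive for two communities and 0 for one), whereas Qs is negative for the split and 0 for the merged partition, so Qs prefers the single community.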


3 MODULARITY WITH SPLIT PENALTY 


In this subsection, we extend the formula of Qs to different kinds of
networks, namely unweighted and undirected networks, weighted networks,
and directed networks, based on the corresponding formulas of modularity
presented in Subsection III.1.

From Subsection III.2, we know that Split Penalty (SP) is the fraction of
edges that connect nodes of different communities. Thus, for undirected
networks, whether unweighted or weighted, Split Penalty is defined as

$$
SP = \sum_{c_i \in C} \left[ \sum_{\substack{c_j \in C \\ c_j \neq c_i}}
     \frac{|E_{c_i,c_j}|}{2|E|} \right],
\tag{4}
$$

where $|E_{c_i,c_j}|$ is the number of edges from community $c_i$ to
community $c_j$ for unweighted networks or the sum of the weights of the
edges from community $c_i$ to community $c_j$ for weighted networks. For
directed networks, Split Penalty is given by

$$
SP = \sum_{c_i \in C} \left[ \sum_{\substack{c_j \in C \\ c_j \neq c_i}}
     \frac{|E_{c_i,c_j}|}{|E|} \right].
\tag{5}
$$

It can be seen that for each community, the Split Penalty only takes into
account the outgoing edges from this community to the rest of the network
but not the incoming edges from the rest of the network to this
community. It is reasonable to use only outgoing edges, because in a
sense those are friendships of community members. Incoming edges may not
be apparent. Moreover, considering both outgoing and incoming edges would
only double the value of Split Penalty because the incoming edges of a
community are the outgoing edges of other communities.

Therefore, for undirected networks, both unweighted and weighted, from
Equations (1), (3), and (4), Qs is defined as

$$
Q_s = Q - SP = \sum_{c_i \in C} \left[ \frac{|E_{c_i}^{in}|}{|E|}
      - \left( \frac{2|E_{c_i}^{in}| + |E_{c_i}^{out}|}{2|E|} \right)^2
      - \sum_{\substack{c_j \in C \\ c_j \neq c_i}} \frac{|E_{c_i,c_j}|}{2|E|} \right].
\tag{6}
$$

For directed networks, using Equations (2), (3), and (5), Qs can be
expressed as

$$
Q_s = Q - SP = \sum_{c_i \in C} \left[ \frac{|E_{c_i}^{in}|}{|E|}
      - \frac{(|E_{c_i}^{in}| + |E_{out,c_i}|)(|E_{c_i}^{in}| + |E_{c_i,out}|)}{|E|^2}
      - \sum_{\substack{c_j \in C \\ c_j \neq c_i}} \frac{|E_{c_i,c_j}|}{|E|} \right].
\tag{7}
$$


4 MOTIVATION FOR INTRODUCING
COMMUNITY DENSITY

Modularity and also Qs have two shortcomings. First, they are independent
of the number of nodes in the communities as long as the number of edges
is preserved. Second, modularity has the resolution limit problem, which
Qs makes even worse.

The first shortcoming is illustrated in Figure 2 with two simple
networks. The left subfigure contains two clique communities and the
right subfigure includes two tree communities. In each subfigure, there
is one single edge that connects the two communities and there are six
edges within each of the four communities, but the number of nodes in the
clique communities is different from the number of nodes in the tree
communities. As shown in Table 8, the values of modularity and Qs of
those two different community structures are the same. However, it is
quite obvious that the two clique communities have better community
structure quality than the two tree communities in terms of node
connections. Moreover, this example shows that the number of nodes of the
network and within the communities influences neither modularity nor Qs.

Figure 2: Two simple network examples with the left one containing two
clique communities and the right one containing two tree communities.
Also, there are six edges within all four communities, but the number of
nodes is different in clique and tree communities.

Table 8: Metric values of the example: two clique communities vs two tree communities.

                          Modularity (Q)   Split Penalty (SP)   Qs       Qds
Two clique communities    0.4231           0.07692              0.3462   0.4183
Two tree communities      0.4231           0.07692              0.3462   0.2214


The second shortcoming, the resolution limit problem, is illustrated in
Figure 3. It displays a ring network comprised of thirty identical
cliques, each of which has five nodes, connected by single edges. In this
case, the modularity of the community structure with each clique forming
a different community, thirty communities in total, should be larger than
that of the community structure in which every two consecutive cliques
form a different community, fifteen communities in total. However, Table
9 shows that the relation is reversed, since the community structure with
fifteen communities has larger modularity than the community structure
with thirty communities. Further, as pointed out in [9], when
$m(m-1) + 2 < n$, where $n$ is the number of cliques and $m$ is the
number of nodes in each clique, modularity is higher for the large
community with two consecutive cliques than for the small community with
a single clique. Moreover, Table 9 demonstrates that the difference of Qs
for these two community structures is larger than the corresponding
difference of modularity. More specifically,
ΔQs = (0.8424 - 0.7848) = 0.0576 > ΔQ = (0.8879 - 0.8758) = 0.0121, which
means that Qs makes the resolution limit problem even worse.

Figure 3: A ring network example made out of thirty identical cliques,
each having five nodes and connected by single edges.

Table 9: Metric values of the example: a ring of thirty cliques, each
having five nodes and connected by single edges.

                       Modularity (Q)   Split Penalty (SP)   Qs       Qds
Thirty communities     0.8758           0.09091              0.7848   0.8721
Fifteen communities    0.8879           0.04545              0.8424   0.4305
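
The modularity values of the ring example can be checked with a few lines of arithmetic, because every community in the ring has the same edge counts. The sketch below is an illustration rather than part of the original analysis; the function name `ring_modularity` and its parameter `group` (the number of consecutive cliques merged into one community) are hypothetical.

```python
def ring_modularity(n, m, group):
    """Modularity of a ring of n cliques with m nodes each.

    Every `group` consecutive cliques form one community; `group` is
    assumed to divide n and to be smaller than n.
    """
    total = n * (m * (m - 1) // 2 + 1)          # clique edges + ring edges
    communities = n // group
    # Edges inside one community: its cliques plus the ring edges that
    # join consecutive cliques of the same community.
    e_in = group * m * (m - 1) // 2 + (group - 1)
    e_out = 2                                   # one ring edge on each side
    per_community = e_in / total - ((2 * e_in + e_out) / (2 * total)) ** 2
    return communities * per_community


n, m = 30, 5
q_single = ring_modularity(n, m, 1)   # thirty communities
q_pairs = ring_modularity(n, m, 2)    # fifteen communities
print(round(q_single, 4), round(q_pairs, 4))
print(m * (m - 1) + 2 < n)            # condition quoted from [9]
```

For n = 30 and m = 5 the merged partition obtains the higher modularity, which is the resolution limit behaviour described above.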


To address the above two shortcomings, it is quite intuitive to introduce
community density into modularity, incorporating both the number of edges
and the number of nodes in the communities, and also Split Penalty. The
corresponding new metric is called Modularity Density (Qds). Table 8
implies that the Qds of the two tree communities is almost half of the
Qds of the two clique communities. Moreover, Table 9 shows that the Qds
of the community structure in which two consecutive cliques form a
different community is almost half of the Qds of the alternative in which
each clique forms a different community. Hence, in this case, Qds avoids
the resolution limit problem. Furthermore, Tables 1-7 and Figure 1
demonstrate that Qds correctly measures the quality of the community
structures of all seven network examples. Even for the network example of
Figure 1(d), in which it is ambiguous which community structure is of
higher quality, the Qds of the one big community is only slightly larger
than the Qds of the two small communities, as shown in Table 4.

5 MODULARITY DENSITY

In this subsection, we give the formulas of Qds for different kinds of
networks, including unweighted and undirected networks, weighted
networks, and directed networks, based on the corresponding formulas of
Qs presented in Subsection III.3.

For undirected networks, regardless of whether they are unweighted or
weighted, we define Qds using Equation (6) as follows

$$
\begin{aligned}
Q_{ds} &= \sum_{c_i \in C} \left[ \frac{|E_{c_i}^{in}|}{|E|} d_{c_i}
          - \left( \frac{2|E_{c_i}^{in}| + |E_{c_i}^{out}|}{2|E|} d_{c_i} \right)^2
          - \sum_{\substack{c_j \in C \\ c_j \neq c_i}}
            \frac{|E_{c_i,c_j}|}{2|E|} d_{c_i,c_j} \right], \\
d_{c_i} &= \frac{2|E_{c_i}^{in}|}{|c_i|(|c_i| - 1)}, \\
d_{c_i,c_j} &= \frac{|E_{c_i,c_j}|}{|c_i||c_j|}.
\end{aligned}
\tag{8}
$$

In the above, $d_{c_i}$ is the internal density of community $c_i$, and
$d_{c_i,c_j}$ is the pair-wise density between community $c_i$ and
community $c_j$. Note that $|E_{c_i}^{in}|$ in $d_{c_i}$ and
$|E_{c_i,c_j}|$ in $d_{c_i,c_j}$ are unweighted for both unweighted and
weighted networks, so that those two community densities are always less
than or equal to 1.0.
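
The definition of Qds for undirected, unweighted networks can be sketched in code as follows. This is a minimal illustration, not the authors' implementation: the function name `modularity_density`, the convention that a single-node community has internal density 0, and the concrete clique and tree graphs (four-node cliques and seven-node star-shaped trees, each with six internal edges and one connecting edge) are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def modularity_density(edges, membership):
    """Modularity Density (Qds) of an undirected, unweighted graph."""
    total = len(edges)
    size = Counter(membership.values())         # |c_i|
    e_in = defaultdict(int)                     # edges inside c_i
    e_between = defaultdict(int)                # edges between (c_i, c_j)
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu == cv:
            e_in[cu] += 1
        else:
            e_between[(cu, cv)] += 1
            e_between[(cv, cu)] += 1
    qds = 0.0
    for c, n_c in size.items():
        links = {b: w for (a, b), w in e_between.items() if a == c}
        e_out = sum(links.values())
        # Assumption: a one-node community gets internal density 0.
        d_c = 2 * e_in[c] / (n_c * (n_c - 1)) if n_c > 1 else 0.0
        qds += e_in[c] / total * d_c
        qds -= ((2 * e_in[c] + e_out) / (2 * total) * d_c) ** 2
        for other, w in links.items():
            d_pair = w / (n_c * size[other])    # pair-wise density
            qds -= w / (2 * total) * d_pair
    return qds


# Two 4-cliques joined by one edge (six edges inside each community).
clique_edges = (list(combinations(range(4), 2))
                + list(combinations(range(4, 8), 2)) + [(0, 4)])
clique_parts = {n: n // 4 for n in range(8)}

# Two 7-node star trees joined by one edge (six edges inside each).
tree_edges = ([(0, i) for i in range(1, 7)]
              + [(7, i) for i in range(8, 14)] + [(1, 8)])
tree_parts = {n: n // 7 for n in range(14)}

qds_cliques = modularity_density(clique_edges, clique_parts)
qds_trees = modularity_density(tree_edges, tree_parts)
print(round(qds_cliques, 3), round(qds_trees, 3))
```

Both graphs have 13 edges and identical modularity and Qs, but the lower internal density of the trees roughly halves their Qds, in line with Table 8.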

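For directed networks, using Equation (7), Qds is given by

$$
\begin{aligned}
Q_{ds} &= \sum_{c_i \in C} \left[ \frac{|E_{c_i}^{in}|}{|E|} d_{c_i}
          - \frac{(|E_{c_i}^{in}| + |E_{out,c_i}|)(|E_{c_i}^{in}| + |E_{c_i,out}|)}{|E|^2} d_{c_i}^2
          - \sum_{\substack{c_j \in C \\ c_j \neq c_i}}
            \frac{|E_{c_i,c_j}|}{|E|} d_{c_i,c_j} \right], \\
d_{c_i} &= \frac{|E_{c_i}^{in}|}{|c_i|(|c_i| - 1)}, \\
d_{c_i,c_j} &= \frac{|E_{c_i,c_j}|}{|c_i||c_j|}.
\end{aligned}
\tag{9}
$$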


IV EVALUATION AND ANALYSIS 

In this section, we first prove that Modularity Density (Qds) solves the
resolution limit problem. Then, we introduce two real dynamic datasets
and various other popular community quality measurements. Finally, we
show the experimental results that validate the ability of Qds to solve
the two problems of modularity (Q) simultaneously.


1 PROOF OF SOLVING RESOLUTION 
LIMIT PROBLEM 

In this subsection, we test Modularity Density (Qds) on the examples from
Fortunato and Barthelemy [9]. First, we prove that Qds does not divide a
clique into two or more parts. Then, we verify that Qds will not merge
two or more adjacent cliques connected with a single edge. Finally, we
prove that Qds can discover communities of different sizes.

Modularity Density (Qds) does not divide a clique into two or more parts.
Given a clique with $m$ ($m \geq 3$) nodes, we prove that maximizing Qds
does not divide this clique into two parts. Consider an arbitrary
partition $P$ that divides the clique into communities $c_1$ and $c_2$
with $m_1$ and $m_2$ nodes, respectively. Then, the number of edges
between $c_1$ and $c_2$ is $m_1 m_2$. Let $Q_{ds}(single)$ be the Qds of
the whole clique and $Q_{ds}(pairs)$ be the Qds of partition $P$. By
definition,

$$
Q_{ds}(single) = 0, \qquad
Q_{ds}(pairs) = \frac{(m_1 - m_2)^2 - m}{m(m-1)} - \frac{m_1^2 + m_2^2}{m^2},
$$

then, using $m = m_1 + m_2$,

$$
Q_{ds}(pairs) - Q_{ds}(single) = \frac{-2 m_1 m_2 - 2 m_1 m_2 m}{m^2 (m-1)} < 0.
$$

Hence, Qds will not divide a clique into two parts. A simple
generalization of this proof demonstrates that Qds will not divide a
clique into three or more parts.
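
The sign of Qds for a split clique can also be checked exhaustively for small sizes. The sketch below is a sanity check under stated assumptions, not a substitute for the proof: the function name `qds_split_clique` is hypothetical, each part of the clique is given internal density 1 (including a single-node part, an assumption made so that the code matches the algebra above), and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction as F


def qds_split_clique(m1, m2):
    """Qds of a clique on m1 + m2 nodes split into parts of m1 and m2 nodes."""
    m = m1 + m2
    total = F(m * (m - 1), 2)         # edges of the whole clique
    between = m1 * m2                 # edges cut by the split
    pair_density = F(between, m1 * m2)  # every cross pair is connected: 1
    q = F(0)
    for size in (m1, m2):
        e_in = F(size * (size - 1), 2)
        d = F(1)                      # a part of a clique is itself a clique
        q += e_in / total * d
        q -= ((2 * e_in + between) / (2 * total) * d) ** 2
        q -= F(between) / (2 * total) * pair_density
    return q


# The unsplit clique has Qds = 1 - 1 = 0, so every split must be negative.
worst = max(
    qds_split_clique(a, m - a)
    for m in range(3, 13)
    for a in range(1, m)
)
print(worst < 0, float(qds_split_clique(4, 4)))
```

The value for an eight-node clique split into two halves agrees with the "Two communities" entry of the complete-graph example in Table 7.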


Modularity Density (Qds) does not merge two or more consecutive cliques
in the clique structure ring network. Consider a network, see Figure
4(a), comprised of a ring of $n$ (where $n \geq 2$ is an even integer)
cliques connected through single edges. Each clique is a complete graph
with $m$ ($m \geq 3$) nodes and $m(m-1)/2$ edges. Then, the ring network
has a total of $nm$ nodes and $nm(m-1)/2 + n$ edges. It is clear that the
ring network has a well-formed community structure where each community
corresponds to a single clique. However, this community structure cannot
be obtained by maximizing modularity [5], since the community structure
with $n/2$ communities of two adjacent cliques each has higher
modularity. We prove that maximizing Qds finds the right community
structure. We let $Q_{ds}(single)$ be the Qds of the community structure
in which each clique is a different community, $n$ communities in total,
and $Q_{ds}(pairs)$ be the Qds of the community structure with every two
consecutive cliques forming a different community, $n/2$ communities in
total. By definition,

$$
Q_{ds}(single) = \frac{m(m-1)}{m(m-1)+2} - \frac{1}{n} - \frac{2}{m^3(m-1) + 2m^2},
$$

$$
Q_{ds}(pairs) = \frac{[m(m-1)+1]^2}{[m(m-1)+2][m(2m-1)]}
              - \frac{2[m(m-1)+1]^2}{n[m(2m-1)]^2}
              - \frac{1}{4m^3(m-1) + 8m^2}.
$$

We need to prove the inequality

$$
Q_{ds}(pairs) < Q_{ds}(single). \tag{10}
$$

The first term of $Q_{ds}(pairs)$ can be rewritten as

$$
\frac{[m(m-1)+1]^2}{[m(m-1)+2][m(2m-1)]}
= \frac{m^4 - 2m^3 + 3m^2 - 2m + 1}{m(m^2 - m + 2)(2m - 1)}.
$$

Then, the first and third terms of $Q_{ds}(single)$, with the latter
combined with the last term of $Q_{ds}(pairs)$, yield

$$
\frac{m^2 - m}{m^2 - m + 2} - \frac{7}{4m^2(m^2 - m + 2)}
= \frac{m^4 - m^3 - 1.75}{m^2(m^2 - m + 2)}.
$$

Combining all these terms, we get

$$
\frac{-m^5 + m^4 + 2m^3 - 2m^2 + 4.5m - 1.75}{m^2(m^2 - m + 2)(2m - 1)}.
$$

We move the remaining two terms to the right-hand side of Inequality (10)
that we are proving, getting

$$
\frac{-m^5 + m^4 + 2m^3 - 2m^2 + 4.5m - 1.75}{m^2(m^2 - m + 2)(2m - 1)}
< \frac{1}{n} \cdot \frac{-2m^4 + 5m^2 - 4m + 2}{m^2(2m - 1)^2}.
$$

Multiplying both sides by $-m^2(2m - 1)$ (and changing the direction of
the inequality), we get

$$
\frac{m^5 - m^4 - 2m^3 + 2m^2 - 4.5m + 1.75}{m^2 - m + 2}
> \frac{1}{4n} \cdot \frac{m^4 - 2.5m^2 + 2m - 1}{m - 0.5}.
$$

By doing the divisions on both sides, we get

$$
m^3 - 4m - 2 + \frac{1.5m + 5.75}{m^2 - m + 2}
> \frac{1}{4n} \left[ m^3 + 0.5m^2 - 2.25m + 0.875 - \frac{9}{16m - 8} \right].
$$

Since $\frac{1.5m + 5.75}{m^2 - m + 2} > 0$, and $\frac{1}{4n} \leq
\frac{1}{8}$ for $n \geq 2$, and also $\frac{9}{16m - 8} > 0$, we just
need to show that

$$
m^3 - 4m - 2 > \frac{m^3 + 0.5m^2 - 2.25m + 0.875}{8},
$$

which simplifies to

$$
7m^3 - 0.5m^2 - 29.75m - 16.875 > 0 \quad \text{for } m \geq 3,
$$

which is easy to prove either by induction, starting at $m = 3$, or by
inspecting the zeros of the derivative $21m^2 - m - 29.75$, which are all
less than 2.0, showing that this polynomial is positive for $m \geq 3$.

Since Inequality (10) holds, Qds will not merge two consecutive cliques
in the ring network. A straightforward extension of the proof shows that
Qds will not merge three or more consecutive cliques.

Figure 4: Two clique structure network examples. (a) A clique structure
ring network. There are n cliques in total (where n is an even positive
integer). Each clique contains m (m ≥ 3) nodes, and two consecutive
cliques are connected by a single edge. (b) A network with two pairs of
identical cliques. One pair of cliques have m (m ≥ 4) nodes, and the
other pair of cliques have p (3 ≤ p < m) nodes.
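
The comparison between the two ring partitions can be spot-checked numerically by evaluating Qds from its definition. The sketch below is illustrative: the function name `ring_qds` and the parameter `group` are hypothetical, and the check is restricted to rings in which each community has two distinct neighbouring communities (here n ≥ 6), which is an assumption of this sketch rather than of the proof.

```python
def ring_qds(n, m, group):
    """Qds of a ring of n m-node cliques, `group` consecutive cliques per community.

    Assumes `group` divides n and that each community is linked by exactly
    one edge to each of two distinct neighbouring communities.
    """
    total = n * (m * (m - 1) // 2 + 1)
    communities = n // group
    size = group * m
    e_in = group * m * (m - 1) // 2 + (group - 1)
    e_out = 2
    d = 2 * e_in / (size * (size - 1))          # internal density
    d_pair = 1 / (size * size)                  # one edge between neighbours
    per_community = (
        e_in / total * d
        - ((2 * e_in + e_out) / (2 * total) * d) ** 2
        - 2 * (1 / (2 * total)) * d_pair        # two neighbouring communities
    )
    return communities * per_community


single = ring_qds(30, 5, 1)
pairs = ring_qds(30, 5, 2)
print(round(single, 4), round(pairs, 4))

# Spot check: single cliques beat merged pairs on a grid of ring sizes.
holds = all(
    ring_qds(n, m, 1) > ring_qds(n, m, 2)
    for n in range(6, 101, 2)
    for m in range(3, 21)
)
print(holds)
```

For the ring of thirty five-node cliques the two values agree with the Qds column of Table 9.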


Modularity Density (Qds) could discover com¬ 
munities with different sizes. Consider a net¬ 
work, shown in Figure [UJb), with two pairs of identi¬ 
cal cliques. The left pair of cliques have m (to > 4) 
nodes, and the right pair of cliques have p (3 < p < 
to) nodes. This network has 2m -|- 2p nodes and 
to(to — 1) -I- p(p — 1) -I- 4 edges. It is obvious that 
each of the four cliques should be a different commu¬ 
nity. However, the authors in [9] found that max¬ 
imizing modularity will merge the right two small 
cliques. Here, we prove that maximizing Qds will not 
merge them. We let Qds{single) denote the Qds of 
the community structure in which each clique corre¬ 
sponds to a single clique, and Qdsipairs) be the Qds 
of the community structure with the right two small 
cliques merged into one community. Clearly, the Qds 
of the left two large cliques will stay the same in those 
two different community structures so we denote it as 
Q(is(0). By definitions, 


Q dsi^^i'^g^Q — f5rfs(0) 


pip - 1 ) 


to(to — 1) -I- pip — I) -I- 4 


[p(p- 1) + 2]" 


2 [to(to — 1) -I- p(p — I) -|- 4] 

I 

TOP [to(to — I) -|- p(p — 1) -|- 4] 
1 

p2 [to(to — 1) + p(p — 1) -I- 4] ’ 
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Qdsipairs) = Qds{0) 


1 

mp [m(rn — 1) + p(p — 1) + 4] 
_ [p{p - 1) + 1]^ [p{p - 1) + 2]^ 

p^ {2p — 1)2 [rn(m — 1) + p{p — 1) + 4]^ 

_ [pjP - 1) + 1]^_ 

p{2p — 1) [m{m — 1) + p{p — 1) + 4] ’ 


The inequality that we need to prove is 

Qds{single) - Qds{pairs) > 0. (11) 


Since

$$
\begin{aligned}
Q_{ds}(single) - Q_{ds}(pairs) = \frac{1}{m(m-1)+p(p-1)+4} \Bigg\{ & p(p-1) - \frac{[p(p-1)+2]^2}{2\,[m(m-1)+p(p-1)+4]} - \frac{1}{p^2} - \frac{[p(p-1)+1]^2}{p(2p-1)} \\
& + \frac{[p(p-1)+1]^2\,[p(p-1)+2]^2}{p^2(2p-1)^2\,[m(m-1)+p(p-1)+4]} \Bigg\},
\end{aligned}
$$

it is clear that the first factor is always positive, so it can be removed from consideration, and the condition that the second factor is positive can be rewritten as

$$
(p^2 - p) + \frac{2\,[p^2-p+1]^2\,[p^2-p+2]^2 - [p^2-p+2]^2\,p^2(2p-1)^2}{2p^2(2p-1)^2\,[m^2-m+p^2-p+4]} > \frac{1}{p^2} + \frac{[p^2-p+1]^2}{p(2p-1)}.
$$

The second term on the left-hand side simplifies to

$$
-\frac{[p^2-p+2]^2}{2p^2(2p-1)^2} \cdot \frac{2p^4-5p^2+4p-2}{m^2-m+p^2-p+4}.
$$

Since, by induction, the polynomial $2p^4 - 5p^2 + 4p - 2$ is positive for $p \ge 3$, and since $m^2 - m + p^2 - p + 4 > 2(p^2 - p + 2)$ because $m > p$, this term is greater than

$$
-\frac{(p^2-p+2)(2p^4-5p^2+4p-2)}{4p^2(2p-1)^2} = \frac{1}{4}\left[-0.5p^2 + 0.375 - \frac{7.5p^3 - 15.625p^2 + 10p - 4}{4p^4 - 4p^3 + p^2}\right].
$$


It is easy to show that the last fraction is less than 0.391 for $p \ge 3$, by using induction or by finding the zeros of the derivative of the fraction, which are all less than 2.5. So we just need to prove that $0.875p^2 - p - 0.004$ is greater than the right-hand side of the rewritten inequality above.

The second term of the right-hand side of that inequality can be rewritten as

$$
\frac{p^4 - 2p^3 + 3p^2 - 2p + 1}{2(p^2 - 0.5p)} = 0.5p^2 - 0.75p + 1.125 - \frac{0.875p - 1}{2(p^2 - 0.5p)} < 0.5p^2 - 0.75p + 1.125,
$$

because $0.875p > 1$ and $p^2 > p$ for $p \ge 2$.

Since $1/p^2 < 0.12$ for $p \ge 3$, the inequality that we need to prove reduces to $0.375p^2 - 0.25p > 1.249$. But for $p \ge 3$, $0.375p^2 - 0.25p \ge 2.625$, which proves Inequality (11). Thus, we conclude that maximizing $Q_{ds}$ will not merge the right two small cliques, demonstrating that $Q_{ds}$ can discover communities of different sizes.

In summary, all the above proofs show that Modularity Density solves the resolution limit problem of modularity.


2 REAL DYNAMIC DATASETS

In this subsection, we introduce two real dynamic datasets on which we conduct experiments in order to validate that Qds avoids the two problems of modularity.

Senate Dataset [28], [36]. The Senate dataset is a time-evolving weighted network comprised of United States senators, where the weight of an edge represents the similarity of their roll call voting behavior. This dataset was obtained from the website voteview.com, and the similarity between a pair of senators was calculated following Waugh et al. [36] as the number of bills on which the two senators voted the same way, normalized by the number of bills on which they both voted. The dataset consists of 111 snapshots in total, corresponding to the Senate's activities over 220 years, and includes 1916 unique senators.

Reality Mining Bluetooth Scan Data [37]. This dataset was created from the records of Bluetooth scans generated among the 94 subjects in the Reality Mining study conducted from 2004 to 2005 at the MIT Media Laboratory. In the network, nodes represent the subjects and the directed edges correspond to the Bluetooth scan records, while the weight of each edge represents the number of direct Bluetooth scans between the two subjects. In the experiments, we only used the records from August 02, 2004 (Monday) to May 29, 2005 (Sunday), and we divided them into weekly snapshots, so each snapshot represents the scans collected during the corresponding week. There are 43 snapshots in total.
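
As an illustration of the Senate edge weights described in this subsection, the following minimal sketch computes the similarity of two senators as the number of bills on which they voted the same way, divided by the number of bills on which both voted. The data layout (a dictionary from a bill identifier to the vote cast) and the function name are assumptions made for this example; they are not the format of the voteview.com data.

```python
from typing import Dict, Hashable, Optional

Votes = Dict[Hashable, str]  # hypothetical layout: bill id -> vote cast ("yea"/"nay")


def vote_similarity(a: Votes, b: Votes) -> Optional[float]:
    """Fraction of jointly voted bills on which the two senators agreed.

    Returns None when the two senators never voted on the same bill,
    because the normalization is undefined in that case.
    """
    shared = a.keys() & b.keys()          # bills on which both senators voted
    if not shared:
        return None
    same = sum(1 for bill in shared if a[bill] == b[bill])
    return same / len(shared)


senator_a = {"bill1": "yea", "bill2": "nay", "bill3": "yea"}
senator_b = {"bill1": "yea", "bill2": "yea", "bill4": "nay"}
weight = vote_similarity(senator_a, senator_b)  # agree on 1 of 2 shared bills
```

Returning None for pairs with no shared bills is a design choice of this sketch; such pairs would simply have no edge in the corresponding snapshot.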


3 COMMUNITY QUALITY MEASUREMENTS

In the discussion of the experimental results we use 
various community quality metrics, including the 
Table 10: The average metric differences between LabelRankT with different values of conditional update parameter q and Estrangement on Senate dataset.

LabelRankT q          0.05       0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9      0.95
Q                  -0.0534   -0.0462   -0.0408   -0.0538   -0.0714   -0.0848    -0.083   -0.0897   -0.0897   -0.0848     -0.08
Qs                  -0.166   -0.0802    0.0468    0.0808    0.0969     0.112     0.116     0.115     0.115     0.111     0.106
Qds                -0.1638   -0.0787   0.04847   0.08297    0.0995    0.1145    0.1182    0.1183    0.1183    0.1135    0.1083
# Intra-edges     -159.102   -32.444   234.296    387.38   510.645   616.855   615.123   624.764   624.764   602.627   580.733
Contraction         -6.806    -3.023     2.481     4.553     5.937     7.033     7.065     7.227     7.227     6.927     6.622
# Inter-edges      -75.962   -54.098  -123.898   -187.99  -245.198  -299.356  -300.108  -303.043  -303.043  -292.782  -282.442
Expansion            6.448      2.91    -2.428    -4.416    -5.737    -6.847    -6.878    -7.009    -7.009    -6.724    -6.431
Conductance          0.213    0.0851   -0.0886    -0.148    -0.186    -0.214    -0.216    -0.224    -0.224    -0.213    -0.201


Table 11: The average metric differences between LabelRankT with different values of conditional update parameter q and Estrangement on Reality Mining Bluetooth Scan data.

LabelRankT q          0.05       0.1       0.2       0.3       0.4       0.5       0.6       0.7       0.8       0.9      0.95
Q                   -0.161    -0.121   -0.0783   -0.0744   -0.0724   -0.0699   -0.0702   -0.0724   -0.0742   -0.0755   -0.0774
Qs                  -0.379    -0.244    -0.107   -0.0802   -0.0538   -0.0497   -0.0382   -0.0405   -0.0521   -0.0634   -0.0713
Qds                 -0.191   -0.0984   -0.0222    -0.017   -0.0116   -0.0116  -0.00318  -0.00826    -0.011   -0.0115   -0.0134
# Intra-edges    -1450.893  -956.006  -479.377  -331.371  -230.261  -183.536   -102.94    -78.93  -155.183  -242.287  -333.419
Contraction        -86.909   -69.914   -52.543   -46.371   -43.176   -40.567   -35.948   -36.425   -38.006   -41.277   -45.425
# Inter-edges      -39.949   -76.524   -159.74  -167.333  -190.947  -190.865  -196.098  -193.123  -188.708  -179.653   -178.96
Expansion           52.529    25.829     6.289      5.76     5.664      7.07     4.881     6.799     6.916     6.117     5.669
Conductance           0.23     0.176     0.114       0.1    0.0934    0.0933    0.0843    0.0955     0.102     0.107     0.104


number of Intra-edges, Contraction, the number of Inter-edges, Expansion, and Conductance [10], which characterize how community-like the connectivity structure of a given set of nodes is. All of them rely on the intuition that communities are sets of nodes with many edges inside them and few edges pointing outside of them. Now, given a network $G = (V, E)$ and a community (a set of nodes) $c$, let $|c|$ be the number of nodes in $c$, and let $|E_c^{in}|$ denote the total number of edges in $c$ for unweighted networks or the total weight of such edges for weighted networks. We denote the total number of edges from the nodes in community $c$ to the nodes outside $c$ for unweighted networks, or the total weight of such edges for weighted networks, as $|E_c^{out}|$. Then, the definitions of the five quality metrics are as follows:

The number of Intra-edges: $|E_c^{in}|$; it is the total number of edges in $c$ or the total weight of such edges. A large value of this metric is better than a small value in terms of the community quality.

Contraction: $2|E_c^{in}|/|c|$ for undirected networks or $|E_c^{in}|/|c|$ for directed networks; it measures the average number of edges per node inside the community $c$ or the average weight per node of such edges. A large value of Contraction is better than a small value in terms of the community quality.

The number of Inter-edges: $|E_c^{out}|$; it is the total number of edges from the nodes in community $c$ to the nodes outside $c$ or the total weight of such edges. A small value of this metric is better than a large value in terms of the community quality.

Expansion: $|E_c^{out}|/|c|$; it measures the average number of edges (per node) that point outside the community $c$ or the average weight per node of such edges. A small value of Expansion is better than a large value in terms of the community quality.

Conductance: $|E_c^{out}|/(2|E_c^{in}| + |E_c^{out}|)$ for undirected networks or $|E_c^{out}|/(|E_c^{in}| + |E_c^{out}|)$ for directed networks; it measures the fraction of the total number of edges that point outside the community for unweighted networks or the fraction of the total weight of such edges for weighted networks. A small value of Conductance is better than a large value in terms of the community quality.
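
The five metrics can be computed directly from an adjacency structure. The following sketch handles only the undirected, unweighted case and assumes a graph stored as a dictionary of neighbor sets without self-loops. For Conductance it assumes the usual undirected form, outgoing edges divided by twice the intra-edges plus the outgoing edges. The function name and the small example graph are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected, unweighted neighbor sets


def community_metrics(graph: Graph, community: Iterable[Hashable]) -> Dict[str, float]:
    """Compute the five quality metrics for one community of an undirected graph."""
    c = set(community)
    if not c:
        raise ValueError("community must contain at least one node")
    # Every intra-community edge is counted from both endpoints, hence // 2.
    intra = sum(len(graph[u] & c) for u in c) // 2
    # Edges leaving the community are counted once, from the inside endpoint.
    inter = sum(len(graph[u] - c) for u in c)
    size = len(c)
    denom = 2 * intra + inter
    return {
        "intra_edges": intra,
        "contraction": 2 * intra / size,
        "inter_edges": inter,
        "expansion": inter / size,
        # Assumed undirected form; defined as 0 for an isolated node set.
        "conductance": inter / denom if denom else 0.0,
    }


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
example: Graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
left = community_metrics(example, {0, 1, 2})
```

For the left triangle this gives 3 intra-edges and 1 inter-edge, so Contraction is 2, Expansion is 1/3, and Conductance is 1/7.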

4 EXPERIMENTAL RESULTS 

In this subsection, we report the results of performing community detection on the two real dynamic datasets introduced in Subsection 2 by using the dynamic community detection algorithms,
Figure 5: The modularity (Q) of the community detection results of LabelRankT and Estrangement (also, the difference between LabelRankT and Estrangement) on (a) each snapshot of Senate dataset at q = 0.7 and on (b) each snapshot of Reality Mining Bluetooth Scan data with q = 0.6.


LabelRankT ^7} and Estrangement [55]. We chose 
these two algorithms because the second relies on modularity optimization while the first does not. In the experiments, we adopted the best parameter of Estrangement but varied the conditional update parameter q ∈ [0, 1] of LabelRankT from 0.05 to 0.95. As seen in the results, in most cases the best q is around 0.7, in agreement with the best value previously reported for LabelRankT. For the community structures found by the two algorithms, we calculated the values of modularity (Q), Qs, Modularity Density (Qds), and the five community quality metrics described in Subsection 3.

Table 10 and Table 11 present the average metric differences between LabelRankT with different values of conditional update parameter q and Estrangement on Senate dataset and Reality Mining Bluetooth Scan data, respectively. That is, we first computed the values of the eight metrics above for the communities detected by Estrangement on each snapshot. Then, we calculated the values of the eight metrics for the communities discovered by LabelRankT on each snapshot, for every value of q. Next, for each q and each snapshot, we obtained the differences of all eight metrics by subtracting the metric values of Estrangement from those of LabelRankT. Finally, averaging those differences of each metric over all the snapshots, we obtained the corresponding average metric differences.
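
The averaging procedure described above can be sketched as follows for one fixed value of q. The representation of the results (one dictionary of metric values per snapshot and per algorithm) and the function name are assumptions made for this illustration, and the numbers in the example are made up rather than taken from the experiments.

```python
from statistics import mean
from typing import Dict, List

MetricValues = Dict[str, float]  # metric name -> value on one snapshot


def average_metric_differences(labelrankt: List[MetricValues],
                               estrangement: List[MetricValues]) -> MetricValues:
    """Average, over snapshots, of (LabelRankT value - Estrangement value)."""
    if len(labelrankt) != len(estrangement) or not labelrankt:
        raise ValueError("need the same, non-zero number of snapshots")
    metric_names = labelrankt[0].keys()
    return {
        name: mean(lr[name] - es[name] for lr, es in zip(labelrankt, estrangement))
        for name in metric_names
    }


# Made-up values for two snapshots, for illustration only.
lr_snapshots = [{"Q": 0.5, "Qds": 0.75}, {"Q": 0.25, "Qds": 0.5}]
es_snapshots = [{"Q": 0.75, "Qds": 0.25}, {"Q": 0.5, "Qds": 0.25}]
diffs = average_metric_differences(lr_snapshots, es_snapshots)
```

A negative entry means that LabelRankT scores lower than Estrangement on that metric on average. Whether this is better or worse depends on the metric, since small values are preferred for Inter-edges, Expansion, and Conductance.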

Table 10 demonstrates that Q gets its largest value when q = 0.2; Qs reaches its largest value when q = 0.6; Qds, Intra-edges, and Contraction get their largest values at q = 0.7 and q = 0.8; also, Inter-edges, Expansion, and Conductance reach their smallest values at q = 0.7 and q = 0.8. Thus, Qds is consistent with the five metrics introduced in Subsection 3 on determining the best q for LabelRankT on Senate dataset, while Q and Qs are not consistent with them. Further, we observe that Q is always negative, which indicates that LabelRankT performs below Estrangement over all q's because the goal of Estrangement is to maximize modularity (Q). However, the other seven metrics imply that LabelRankT performs better than Estrangement when q > 0.1. Therefore, we can explicitly observe that maximizing Q to detect communities has problems in measuring the community detection quality correctly on Senate dataset.

Table 11 shows that six metrics get their best (largest or smallest) values at q = 0.6, while the two exceptions, Q and the number of Intra-edges, reach their largest values when q = 0.5 and q = 0.7, respectively. Thus, the six metrics, excluding Q and the number of Intra-edges, are consistent on determining the best value of q for LabelRankT on Reality Mining Bluetooth Scan data. This indicates that on Reality Mining Bluetooth Scan data, maximizing Q to detect communities has problems.

It is also interesting to observe that for q = 0.05 and q = 0.1 in Table 10, the Inter-edges metric implies that LabelRankT performs better than Estrangement on Senate dataset, which is not consistent with the Qs, Qds, Intra-edges, Contraction, Expansion, and Conductance metrics. Moreover, we learn from Table 11 that all the metrics, except the Inter-edges metric, imply that LabelRankT performs slightly below Estrangement over all q's. Thus, the Inter-edges metric has some problems. Also, as mentioned in the paragraph above, the Intra-edges metric is not consistent with the other six metrics on determining the best q for LabelRankT, which also means that the Intra-edges metric has problems. We conjecture that the reason for the shortcoming of the Intra-edges and Inter-edges metrics is the same as in the case of modularity (Q), which does not consider the number of nodes in the communities. This reason also implies the superiority of Qds over Q and Qs.

Figure 6: Qs of the community detection results of LabelRankT and Estrangement (also, the difference between LabelRankT and Estrangement) on (a) each snapshot of Senate dataset at q = 0.7 and on (b) each snapshot of Reality Mining Bluetooth Scan data with q = 0.6.

Figure 7: The Modularity Density (Qds) of the community detection results of LabelRankT and Estrangement (also, the difference between LabelRankT and Estrangement) on (a) each snapshot of Senate dataset at q = 0.7 and on (b) each snapshot of Reality Mining Bluetooth Scan data with q = 0.6.

Based on the results presented in the above two tables, we conclude that Qds solves the two problems of modularity. We also conjecture that the difference between the best values of q for LabelRankT determined by Q and Qs, and the difference determined by Qs and Qds, on Senate dataset are a manifestation of the two problems of modularity maximization, namely favoring small communities and the resolution limit problem. Moreover, the difference between the best values of q for LabelRankT determined by Q and Qs on Reality Mining Bluetooth Scan data indicates that maximizing Q has the problem of favoring small communities. Thus, Qs and Qds can be used for checking whether finding communities by maximizing Q on a specific dataset will suffer from either of the two problems.


To make the differences among Q, Qs, and Qds clearer, we plot their values in Figures 5, 6, and 7 for the community detection results of LabelRankT and Estrangement on each snapshot of Senate dataset at q = 0.7 and on each snapshot of Reality Mining Bluetooth Scan data at q = 0.6. Figure 5(a) shows that in most cases Q is negative, while Qs and Qds are positive, as seen in Figure 6(a) and Figure 7(a). This indicates that there is a large difference between Q and Qs and between Q and Qds, which is consistent with Table 10. Further, it can be observed from Figure 6(a) and Figure 7(a) that Qs and Qds are almost the same on each snapshot, which is also consistent with Table 10. Figure 5(b), Figure 6(b), and Figure 7(b) demonstrate that Q, Qs, and Qds are negative in most of the cases, although their values are different in each snapshot. These observations are consistent with the results shown in Table 11.


V CONCLUSION AND FUTURE WORK 

In this paper, we propose a new community quality metric, called Modularity Density, which solves the problems of modularity of favoring small communities in some circumstances and large communities in others. We demonstrate with proofs and experiments on real dynamic datasets that Modularity Density is an effective alternative to modularity.

In the future, we plan to extend Modularity Density to enable evaluation of the quality of overlapping community structures. We will also propose a community detection algorithm based on Modularity Density maximization and then compare its community detection results with those of modularity maximization algorithms on some typical real networks.